12. DO: Lab

Introduction

Arpan mentioned three areas of biologically inspired robots: actuation, vision/behavior, and control. So far, actuation and vision have been discussed. Now it is time to investigate control.

Robot control is an interesting topic. When building robots (especially expensive ones like MIT's Cheetah), it is not cost effective to test every possible configuration or sequence of motions on the real robot. Doing so is time consuming (the number of configurations is far too large to explore exhaustively) and costly (unsuccessful trials may damage the robot, on top of the normal wear and tear from use). As a result, roboticists (like most scientists) turn to simulation. Simulations provide good insight into how the system will likely behave in the real world ("likely" because hardware typically behaves differently than simulation) and can usually be run faster, covering more scenarios without risk to the hardware.

Creating simulators capable of handling multiple biologically inspired robots is time consuming. Luckily, the folks at OpenAI have put together a great tool called Roboschool. Roboschool is open-source robot simulation software that integrates with OpenAI Gym, a platform for testing and benchmarking reinforcement learning algorithms.

Reinforcement Learning and Control

As Arpan mentioned, reinforcement learning is an important technique for developing robot controls through trial and error guided by an incentive (much like a baby learning to crawl or walk: various configurations are tried until the child reaches its goal of moving efficiently from one location to the next). Reinforcement learning typically consists of a model with:

  • a set of environment and agent states **S**
  • a set of actions **A** of the agent
  • policies of transitioning from states to actions
  • rules that determine the *scalar immediate reward* of a transition
  • rules that describe what the agent observes

As the agent explores its environment, a reward function tells it how much each new state, action, and transition combination is worth. The higher the reward, the more the agent should try to reproduce those results; the lower the reward, the less likely the agent is to explore those options again.
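To make that idea concrete, here is a minimal sketch of how rewards can shape an agent's preferences, using a simple tabular value update (a toy Q-learning step). The states, actions, and hyperparameters are placeholders, not anything specific to Gym or Roboschool; the algorithms used in this lab apply the same principle at a much larger scale.

```python
import random
from collections import defaultdict

# Toy illustration only: states, actions, and rewards would come from a real
# environment. The hyperparameters below are arbitrary placeholder values.
Q = defaultdict(float)               # estimated worth of each (state, action) pair
alpha, gamma, epsilon = 0.1, 0.99, 0.1

def choose_action(state, actions):
    # Mostly pick the action currently believed to be best ("exploit"),
    # but occasionally try something new ("explore").
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state, actions):
    # A high reward raises the estimated worth of (state, action), making the
    # agent more likely to repeat that transition; a low reward does the opposite.
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
```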

In the case of our biologically inspired robots, the state is the position of the robot (i.e., its physical configuration and location in the environment), and the action is the movement of the robot (i.e., how the robot moves its body from one position to the next). The reward is typically based on the amount of energy required to achieve the configuration along with the amount of locomotion generated (i.e., how far the robot traveled for the energy it exerted). The greatest rewards go to configurations that generate the most locomotion with the least actuation energy required.
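As a rough illustration (not Roboschool's actual reward function, whose exact terms vary by environment), a locomotion reward of this kind might look like the sketch below; both the torque-based energy term and the weight on it are assumptions.

```python
def locomotion_reward(distance_moved, joint_torques, energy_weight=0.05):
    """Reward forward progress and penalize actuation effort.

    `distance_moved` is how far the robot advanced this step, and
    `joint_torques` is used as a crude proxy for the energy spent moving its joints.
    """
    energy_cost = sum(t * t for t in joint_torques)
    return distance_moved - energy_weight * energy_cost
```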

OpenAI Gym and Roboschool

The reinforcement learning algorithms used with OpenAI Gym try various configurations in succession as long as the robot keeps moving forward, and each configuration tried is awarded a score. Once the simulated robot falls, or achieves locomotion for a set amount of time, the simulation ends, the score is tallied, and a new simulation begins. Typically, successful configurations are built upon while unsuccessful ones are discarded. Over time, the robot "learns" how to keep itself upright while maximizing locomotion and minimizing the energy exerted.
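A minimal episode loop with Gym and Roboschool looks roughly like the sketch below. The environment id and version are assumptions that may differ with your installed release, and the random action sampling is only a stand-in for a learned policy.

```python
import gym
import roboschool  # importing this registers the Roboschool environments with Gym

# Environment id is an assumption; check the Gym registry for the ids your
# Roboschool release actually provides.
env = gym.make("RoboschoolHalfCheetah-v1")

for episode in range(3):
    obs = env.reset()
    total_reward, done = 0.0, False
    while not done:
        # A trained agent would choose actions from its learned policy;
        # random sampling stands in for that here.
        action = env.action_space.sample()
        obs, reward, done, info = env.step(action)
        total_reward += reward
        env.render()
    print("Episode {}: score = {:.1f}".format(episode, total_reward))
```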

For this lab, you will download and play around with OpenAI Gym and Roboschool. We won't get into many of the specifics of reinforcement learning here, but you are highly encouraged to investigate them further on your own and discuss them with the community. If you would like to learn more about reinforcement learning, Arthur Juliani does a great job of providing context and applications for it.